Action Selection Methods Using Reinforcement Learning
Abstract
Action Selection schemes, when translated into precise algorithms, typically involve considerable design effort and tuning of parameters. Little work has been done on solving the problem using learning. This paper compares eight different methods of solving the action selection problem using Reinforcement Learning (learning from rewards). The methods range from centralised and cooperative to decentralised and selfish. They are tested in an artificial world and their performance, memory requirements and reactiveness are compared. Finally, the possibility of more exotic, ecosystem-like decentralised models is considered.

By Action Selection we do not mean the low-level problem of choice of action in pursuit of a single coherent goal. Rather we mean the higher-level problem of choice between conflicting and heterogeneous goals. These goals are pursued in parallel. They may sometimes combine to achieve larger-scale goals, but in general they simply interfere with each other. They may not have any terminating conditions.

Typically, the action selection models proposed in ethology are not detailed enough to specify an algorithmic implementation (see [Tyrrell, 1993] for a survey, and for some difficulties in translating the conceptual models into computational ones). The models that do lend themselves to algorithmic implementation then typically require a considerable design effort. In the literature, one sees formulas taking weighted sums of various quantities in an attempt to estimate the utility of actions. There is much hand-coding and tuning of parameters (e.) until the designer is satisfied that the formulas deliver utility estimates that are fair.

In fact, there may be a way that these utility values can come for free. Learning methods that automatically assign values to actions are common in the field of Reinforcement Learning (RL) [Kaelbling, 1993]. Reinforcement Learning propagates numeric rewards into behavior patterns.
The rewards may be external value judgements, or just internally generated numbers. This paper compares eight different methods of further propagating these numbers to solve the action selection problem. The low-level problem of pursuing a single goal can be solved by straightforward RL, which assumes such a single goal. For the high-level problem of choice between conflicting goals we try various methods exploiting the low-level RL numbers. In general, Reinforcement Learning work has concentrated on problems with a single goal. For complex problems that need to be broken into subproblems, most of the work either designs the decomposition by hand [Moore, 1990], or deals with problems where the sub-tasks have termination …
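The abstract does not spell out the eight methods, so the following is only a minimal sketch of the general idea, assuming each goal is handled by its own tabular Q-learning module and that the per-module values are then combined by an arbitration rule. The names `QModule`, `select_cooperative` and `select_selfish`, the parameters `alpha` and `gamma`, and the two rules themselves (largest summed value versus largest single value) are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


class QModule:
    """One goal-specific learner with its own reward signal (hypothetical)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        self.q = defaultdict(float)  # (state, action) -> estimated value

    def value(self, state, action):
        return self.q[(state, action)]

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update using this module's own reward.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def select_cooperative(modules, state, actions):
    """Centralised rule (assumed): pick the action with the largest summed value."""
    return max(actions, key=lambda a: sum(m.value(state, a) for m in modules))


def select_selfish(modules, state, actions):
    """Decentralised rule (assumed): the single highest module value wins."""
    pairs = ((m.value(state, a), a) for m in modules for a in actions)
    return max(pairs, key=lambda pair: pair[0])[1]


# Two modules with conflicting goals; the values are hand-set for illustration.
actions = ["left", "right", "stay"]
food = QModule(actions)
safety = QModule(actions)
food.q.update({("s", "left"): 1.0, ("s", "stay"): 0.6, ("s", "right"): 0.0})
safety.q.update({("s", "left"): -0.5, ("s", "stay"): 0.6, ("s", "right"): 0.9})

cooperative_choice = select_cooperative([food, safety], "s", actions)
selfish_choice = select_selfish([food, safety], "s", actions)
```

With these hand-set values the summed rule settles on the compromise action `stay`, while the winner-take-all rule follows the `food` module's strongest preference, `left`. The point of the sketch is that no weights are tuned by hand: the arbitration only reuses numbers the modules have already learned.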
Similar resources
RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features
Principal aim of a search engine is to provide the sorted results according to user’s requirements. To achieve this aim, it employs ranking methods to rank the web documents based on their significance and relevance to user query. The novelty of this paper is to provide user feedback-based ranking algorithm using reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...
Modular Q-learning based multi-agent cooperation for robot soccer
In a multi-agent system, action selection is important for the cooperation and coordination among agents. As the environment is dynamic and complex, modular Q-learning, which is one of the reinforcement learning schemes, is employed in assigning a proper action to an agent in the multi-agent system. The architecture of modular Q-learning consists of learning modules and a mediator module. The m...
Evolutionary Computation for Reinforcement Learning
Algorithms for evolutionary computation, which simulate the process of natural selection to solve optimization problems, are an effective tool for discovering high-performing reinforcement-learning policies. Because they can automatically find good representations, handle continuous action spaces, and cope with partial observability, evolutionary reinforcement-learning approaches have a strong ...
Using the XCS Classifier System for Multi-objective Reinforcement Learning Problems
We investigate the performance of a learning classifier system in some simple multi-objective, multi-step maze problems, using both random and biased action-selection policies for exploration. Results show that the choice of action-selection policy can significantly affect the performance of the system in such environments. Further, this effect is directly related to population size, and we rel...
Code-Specific Learning Rules Improve Action Selection by Populations of Spiking Neurons
Population coding is widely regarded as a key mechanism for achieving reliable behavioral decisions. We previously introduced reinforcement learning for population-based decision making by spiking neurons. Here we generalize population reinforcement learning to spike-based plasticity rules that take account of the postsynaptic neural code. We consider spike/no-spike, spike count and spike laten...
Publication date: 1996